S&P 500 Index Time Series EDA
Source: Exploratory Data Analysis of Time Series Data
Read the dataset of S&P 500 Index adjusted closing prices, daily returns (%), and volatility.
spx <- read_rds('../data/spx_return_vol_tbl.rds')
spx
# A tibble: 2,526 × 4
   date       adjusted   return volatility
   <date>        <dbl>    <dbl>      <dbl>
 1 2013-01-03    1459. -0.209      0.209
 2 2013-01-04    1466.  0.487      0.487
 3 2013-01-07    1462. -0.312      0.312
 4 2013-01-08    1457. -0.324      0.324
 5 2013-01-09    1461.  0.266      0.266
 6 2013-01-10    1472.  0.760      0.760
 7 2013-01-11    1472. -0.00475    0.00475
 8 2013-01-14    1471. -0.0931     0.0931
 9 2013-01-15    1472.  0.113      0.113
10 2013-01-16    1473.  0.0197     0.0197
# … with 2,516 more rows
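The printed rows suggest how the columns relate: every volatility value equals the absolute value of return, and return looks like the day-over-day percentage change in adjusted. The source does not show how the dataset was built, so the following is only a sketch under that assumption, using simple (not log) returns and a made-up price vector.

```r
# Hypothetical prices, not taken from the dataset
adjusted <- c(100, 102, 101, 101)

# Simple daily return in percent: 100 * (P_t / P_{t-1} - 1)
ret <- 100 * (adjusted[-1] / adjusted[-length(adjusted)] - 1)

# Assumed volatility proxy: absolute daily return
vol <- abs(ret)

data.frame(adjusted = adjusted[-1], return = ret, volatility = vol)
```

The first price has no previous day to compare against, which would explain why a series built this way starts one trading day after the first available price.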
Plot time series
spx %>%
  select(date, adjusted) %>%
  plot_time_series(date, adjusted,
    .title = 'S&P 500 Index Daily Adjusted Closing Price (Jan 2013 - Jan 2023 (partial))')

spx %>%
  plot_time_series(date, return,
    .title = 'S&P 500 Index Daily Adjusted Closing Return Percentage (Jan 2013 - Jan 2023 (partial))')

spx %>%
  plot_time_series(date, volatility,
    .title = 'S&P 500 Index Daily Volatility Percentage (Jan 2013 - Jan 2023 (partial))')

ACF/PACF Diagnostics
spx %>%
  plot_acf_diagnostics(date, adjusted)

spx %>%
  plot_acf_diagnostics(date, return)

spx %>%
  plot_acf_diagnostics(date, volatility)

Seasonal Diagnostics
spx %>%
  plot_seasonal_diagnostics(
    .date_var = date,
    .value = adjusted
  )

spx %>%
  plot_seasonal_diagnostics(
    .date_var = date,
    .value = return
  )

spx %>%
  plot_seasonal_diagnostics(
    .date_var = date,
    .value = volatility
  )

Anomaly Diagnostics
spx %>%
  plot_anomaly_diagnostics(
    .date_var = date,
    .value = adjusted,
    .alpha = 0.05,
    .max_anomalies = 0.03
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months

spx %>%
  plot_anomaly_diagnostics(
    .date_var = date,
    .value = return,
    .alpha = 0.05,
    .max_anomalies = 0.03
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months

spx %>%
  plot_anomaly_diagnostics(
    .date_var = date,
    .value = volatility,
    .alpha = 0.05,
    .max_anomalies = 0.03
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
Seasonal Decomposition
spx %>%
  plot_stl_diagnostics(
    .date_var = date,
    .value = adjusted
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months

spx %>%
  plot_stl_diagnostics(
    .date_var = date,
    .value = return
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months

spx %>%
  plot_stl_diagnostics(
    .date_var = date,
    .value = volatility
  )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
Heteroskedasticity (variance not uniform across the time series) test
Using bptest from the lmtest package
Hypothesis test:
Null hypothesis (H0): Time series variance is uniform
Alternate hypothesis (Ha): Time series variance is not uniform
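To make the test less of a black box, here is a minimal base-R sketch of the studentized Breusch-Pagan statistic on simulated data (not the S&P 500 series). It fits a linear model against time, regresses the squared residuals on the same regressor, and uses n * R^2 as the test statistic. The simulated series and all variable names are assumptions for illustration; the statistic should correspond to what the studentized bptest reports, but treat that as something to verify rather than a guarantee.

```r
set.seed(42)
n <- 500
x <- seq_len(n)

# Simulated series whose noise standard deviation grows with time
y <- 0.01 * x + rnorm(n, sd = 0.01 * x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same regressor
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) statistic: n * R^2, chi-squared with 1 df under H0
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(BP = bp_stat, p.value = bp_p)
```

A small p-value here means the squared residuals vary systematically with time, which is the sense in which the variance is "not uniform".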
lm_model_adj <- lm(adjusted ~ as.numeric(date), data = spx)
bptest(lm_model_adj, data = spx)
studentized Breusch-Pagan test
data: lm_model_adj
BP = 469.1, df = 1, p-value < 2.2e-16
Since the p-value < 0.05, we can reject the null hypothesis in favor of the alternate hypothesis, i.e., there is significant evidence that the variance of the time series is not uniform (the series may require a transformation).
lm_model_ret <- lm(return ~ as.numeric(date), data = spx)
bptest(lm_model_ret, data = spx)
studentized Breusch-Pagan test
data: lm_model_ret
BP = 35.784, df = 1, p-value = 2.205e-09
Since the p-value < 0.05, we can reject the null hypothesis in favor of the alternate hypothesis, i.e., there is significant evidence that the variance of the time series is not uniform (the series may require a transformation).
lm_model_vol <- lm(volatility ~ as.numeric(date), data = spx)
bptest(lm_model_vol, data = spx)
The output originally recorded for this step came from bptest(lm_model_ret, data = spx), so it only repeated the return result (BP = 35.784, df = 1, p-value = 2.205e-09). The test on lm_model_vol has to be re-run before drawing a conclusion about the volatility series. As for the other two series, a p-value < 0.05 would mean rejecting the null hypothesis of uniform variance.
Stationarity Test
What is the definition of a stationary time series?
According to the textbook Chapter 8.1 - Stationarity and differencing, “a stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.”
Is there a test to detect time series stationarity?
Yes. The traditional test is the ADF (Augmented Dickey-Fuller) test.
Hypothesis test:
Null hypothesis (H0): Time series is non-stationary
Alternate hypothesis (Ha): Time series is stationary
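As a rough illustration of what the test does, the sketch below computes a plain Dickey-Fuller statistic in base R: the t-statistic on the lagged level in a regression of the differenced series on a constant, a linear trend, and the lagged level. It is a simplification (no lagged differences, simulated data, hypothetical helper name df_stat), not the implementation behind adf.test. The statistic does not follow a standard t distribution; with a constant and trend, the approximate 5% critical value is about -3.41, so values well below that point towards stationarity.

```r
# t-statistic on the lagged level in: diff(y) ~ const + trend + y[t-1]
df_stat <- function(y) {
  dy   <- diff(y)
  ylag <- y[-length(y)]
  tt   <- seq_along(dy)
  fit  <- lm(dy ~ ylag + tt)
  unname(coef(summary(fit))["ylag", "t value"])
}

set.seed(1)
e <- rnorm(500)
stat_noise <- df_stat(e)          # stationary white noise
stat_walk  <- df_stat(cumsum(e))  # random walk (unit root)
c(noise = stat_noise, walk = stat_walk)

# Lag order under the rule trunc((n - 1)^(1/3)) for n = 2,526 rows;
# assuming this default, it reproduces the "Lag order = 13" in the outputs
lag_order <- trunc((2526 - 1)^(1/3))
lag_order
```

The white-noise statistic is strongly negative, while the random walk stays much closer to zero, which mirrors the contrast between the return and adjusted series in this section.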
adf.test(spx$adjusted)
Augmented Dickey-Fuller Test
data: spx$adjusted
Dickey-Fuller = -2.6125, Lag order = 13, p-value = 0.319
alternative hypothesis: stationary
Since the p-value > 0.05, we cannot reject the null hypothesis. There is no evidence that the series is stationary, so we treat it as non-stationary.
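A non-stationary price level is commonly handled by differencing, which is essentially what a return series does. The sketch below uses a simulated random walk (a hypothetical stand-in for an index level, not the actual data) to show that the first difference recovers the stationary increments, and that log differences in percent stay very close to simple percentage returns for daily moves of this size.

```r
set.seed(123)
increments <- rnorm(250, mean = 0.05, sd = 1)  # stationary daily changes
price <- 1500 + cumsum(increments)             # hypothetical index level

# First difference removes the stochastic trend in the level
d_price <- diff(price)

# Log returns in percent, a common alternative to simple returns
log_ret <- 100 * diff(log(price))
simple_ret <- 100 * (price[-1] / price[-length(price)] - 1)

summary(d_price)
max(abs(log_ret - simple_ret))  # gap between the two return definitions
```

Taking logs before differencing also acts as a variance-stabilizing transformation, which is one option when the heteroskedasticity test suggests the series may need one.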
adf.test(spx$return)
Warning in adf.test(spx$return): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: spx$return
Dickey-Fuller = -13.849, Lag order = 13, p-value = 0.01
alternative hypothesis: stationary
Since the p-value < 0.05, we can reject the null hypothesis in favor of the alternate hypothesis, i.e., the time series is stationary.
adf.test(spx$volatility)
Warning in adf.test(spx$volatility): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: spx$volatility
Dickey-Fuller = -7.5903, Lag order = 13, p-value = 0.01
alternative hypothesis: stationary
Since the p-value < 0.05, we can reject the null hypothesis in favor of the alternate hypothesis, i.e., the time series is stationary.